This repository was archived by the owner on Dec 9, 2024. It is now read-only.
fix the bug for eval function while variable_update=parameter_server|distributed_replicated #47
Open
pan463194277 wants to merge 1 commit into tensorflow:master from …distributed_replicated
Conversation
Member
@reedwm Can you take a look? I think you are dealing with this internally. I will merge internal to external to get a better version out here and would like to clean up these PRs first. Thank you.
freedomtan pushed a commit to freedomtan/benchmarks that referenced this pull request on Apr 18, 2018
Merge internal changes into public repository (change 181251654)
Firstly, the _eval function currently doesn't support the modes 'variable_update=parameter_server' and 'variable_update=distributed_replicated', and errors occur when using the 'replicated' mode to restore parameters from a checkpoint file created by training with 'variable_update=parameter_server|distributed_replicated', so I changed the 'target' to fix it.
Secondly, while variable_update='distributed_replicated', the result of the eval function looks incorrect. I found that tf.global_variables() contained no parameters when restoring the checkpoint; even during training, tf.global_variables() contained only 190+ parameters (copies of the trainable variables from local_variables), without 'batchnorm/gamma', 'batchnorm/moving_mean', and 'batchnorm/moving_variance'. So I changed the code to save/restore parameters from/to tf.local_variables() and it worked.
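The idea in the second point can be sketched roughly as follows. This is a minimal illustration of the PR's description, not the actual tf_cnn_benchmarks code: `build_eval_saver` and its `variable_update` parameter are hypothetical names, and it assumes TensorFlow 1.x-style graph mode (via `tf.compat.v1` on TF 2.x installs).

```python
def build_eval_saver(variable_update):
    """Create a Saver over the variable collection that actually holds
    the model parameters for the given variable_update mode.

    Hypothetical helper illustrating the PR's idea. Assumes a TF 1.x-style
    graph; the import lives inside the function so the sketch loads even
    where TensorFlow is not installed.
    """
    import tensorflow.compat.v1 as tf

    if variable_update == 'distributed_replicated':
        # Per the PR description, under distributed_replicated
        # tf.global_variables() holds only copies of the trainable
        # variables; batchnorm statistics such as
        # 'batchnorm/moving_mean' and 'batchnorm/moving_variance'
        # live in the local-variable collection, so save/restore there.
        var_list = tf.local_variables()
    else:
        # parameter_server and replicated keep the full parameter set
        # in the global-variable collection.
        var_list = tf.global_variables()
    return tf.train.Saver(var_list=var_list)
```

Usage would then look like `saver = build_eval_saver('distributed_replicated')` followed by `saver.restore(sess, checkpoint_path)` in the eval path, instead of a Saver built unconditionally over tf.global_variables().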